Error Localization and Implied Edit Generation for Ratio and Balancing Edits
Abstract
The U.S. Census Bureau has developed SPEER software that applies the Fellegi-Holt editing method to economic establishment surveys under ratio edits and a limited form of balancing edits. It is known that more than 99% of economic data require only these basic forms of edits. If implicit edits are available, then Fellegi-Holt methods have the advantage that they determine the minimal number of fields to change (error localize) so that a record satisfies all edits in one pass through the data. In most situations, implicit edits are not generated because the generation requires days to months of computation. In some situations when implicit edits are not available, Fellegi-Holt systems use pure integer programming methods to solve the error localization problem directly and slowly (1-100 seconds per record). With only a small subset of the needed implicit edits, the current version of SPEER (Draper and Winkler 1997), which processes upwards of 1000 records per second, applies ad hoc heuristics that find error-localization solutions that are not optimal for as many as five percent of the edit-failing records. To maintain the speed of SPEER and do a better job of error localization, we apply the Fourier-Motzkin method to generate a large subset of the implied edits prior to error localization. In this paper, we describe the theory, computational algorithms, and results from evaluating the feasibility of this approach.
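To make the role of implied edits concrete, here is a minimal sketch in Python. It is not SPEER's code and it ignores balancing edits; it assumes every edit is a ratio edit of the form x_i <= U[i][j] * x_j with positive bounds. Under that assumption, Fourier-Motzkin elimination of a field x_j from x_i <= U[i][j] * x_j and x_j <= U[j][k] * x_k gives the implied edit x_i <= U[i][j] * U[j][k] * x_k, so the implied bounds can be closed with a Floyd-Warshall style loop. The function names, the bounds matrix `upper`, and the sample `record` are all hypothetical, and the error localization step is a brute-force search rather than the method evaluated in the paper.

```python
from itertools import combinations
from math import inf


def implied_ratio_bounds(upper):
    """Close a matrix of ratio bounds under implication.

    upper[i][j] is the explicit upper bound on x_i / x_j (inf if none).
    Assumes all bounds are positive and the edit set is consistent.
    """
    n = len(upper)
    u = [row[:] for row in upper]
    for j in range(n):              # field being eliminated
        for i in range(n):
            for k in range(n):
                if i != k and u[i][j] * u[j][k] < u[i][k]:
                    u[i][k] = u[i][j] * u[j][k]
    return u


def failed_edits(record, u):
    """Return the pairs (i, k) whose edit x_i <= u[i][k] * x_k fails."""
    n = len(record)
    return [(i, k) for i in range(n) for k in range(n)
            if i != k and u[i][k] != inf and record[i] > u[i][k] * record[k]]


def minimal_fields_to_change(record, u):
    """Brute-force search for a smallest set of fields that covers every failed edit."""
    failed = failed_edits(record, u)
    n = len(record)
    for size in range(n + 1):
        for fields in combinations(range(n), size):
            if all(i in fields or k in fields for i, k in failed):
                return set(fields)


# Hypothetical explicit edits: 1 <= x0/x1 <= 2 and 2 <= x1/x2 <= 3.
upper = [
    [1.0, 2.0, inf],
    [1.0, 1.0, 3.0],
    [inf, 0.5, 1.0],
]
record = [10.0, 8.0, 1.0]

closed = implied_ratio_bounds(upper)
explicit_only = minimal_fields_to_change(record, upper)
with_implied = minimal_fields_to_change(record, closed)
```

In this made-up record only the explicit edit between fields 1 and 2 fails, so a search over explicit edits alone selects field 1, yet no value of field 1 can satisfy both explicit edits at once. After closure, the implied edit x0 <= 6 * x2 also fails, and the only single field that covers both failures is field 2, which can be imputed consistently. This is the behavior the abstract attributes to having implicit edits available before error localization.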
Similar resources
Implied Edit Generation and Error Localization for Ratio and Balancing Edits
The U.S. Census Bureau has developed SPEER software that applies the Fellegi-Holt editing method to economic establishment surveys under ratio edits and a limited form of balancing edits. It is known that more than 99% of economic data require only these basic forms of edits. If implicit edits are available, then Fellegi-Holt methods have the advantage that they determine the minimal number of fields ...
Generating, Locating, and Applying Systematic Edits by Learning from Example(s) Ph.D. Proposal
Programmers make systematic edits—similar, but not identical changes to multiple places during software development and maintenance. Finding all the correct locations and making correct edits is a tedious and error-prone process. Existing tools for automating systematic edits are limited because they do not support edit generation, edit location suggestion, or edit application at the same time,...
Recognizing Textual Entailment for Italian EDITS @ EVALITA 2009
This paper overviews FBK’s participation in the Textual Entailment task at EVALITA 2009. Our runs were obtained through different configurations of EDITS (Edit Distance Textual Entailment Suite), the first freely available open source tool for Recognizing Textual Entailment (RTE). With a 71% Accuracy, EDITS reported the best score out of the 8 submitted runs. We describe the sources of knowledg...
Extending the Fellegi-Holt Model of Statistical Data Editing
This paper provides extensions to the theory and the computational aspects of the Fellegi-Holt Model of Editing (JASA 1976). If implicit edits can be generated prior to editing, then error localization (finding the minimum number of fields to impute) can be quite rapid. In some situations, not all of the implicit edits can be generated because of the great number (> 10^30) of distinct edit patt...
Automatically Classifying Edit Categories in Wikipedia Revisions
In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machin...